# This block of code will create a blank SPI dataset (only containing country info) that will be appended to when each indicator is added.
# There will be two indicators added for each dimension
# 1. An indicator with a score between 0-1 for each dimension
# 2. An indicator with the raw (unscored) values of the indicators
# The unit for this database will be country*year
span <- c(2004:2019)
spi_df_empty <- bind_rows(replicate(length(span), wbstats::wbcountries(), simplify = FALSE), .id='date') %>%
mutate(date=as.numeric(date)+span[1]-1) %>%
filter(region!="Aggregates") # take out the aggregates (LAC, SAR, etc)
spi_df <- spi_df_empty
#read list of iso3c codes for matching from UN (https://unstats.un.org/unsd/methodology/m49/)
iso3c <- read_csv(paste(raw_dir,'metadata/iso_codes.csv', sep="/"),
col_types=list(col_character(), col_character(), col_character()))
In this program, I will clean and reshape the raw data files in the 01_rawdata folder in order to produce the Statistical Performance Indicators.
Below is an overview of the SPI framework.
# Handling missing data
It is likely that for the initial launch of the Dashboard there will be several types of missing data.
In the absence of any new data, the existing material collected for the Statistical Performance Index could be used to populate an initial dashboard as follows:
• Data use dimension: leave blank
• Data services dimension: use scores for SPI “dissemination practices and openness” dimension
• Data products dimension: use scores for SPI “availability of key indicators” dimension
• Data sources dimension: use scores for SPI “censuses and surveys” dimension
• Data infrastructure dimension: use scores for SPI “methodology, standard and classifications” dimension
Indeed, a first iteration Dashboard and toolkit can be demonstrated using this existing data. As new methods are developed and data are collected, the Dashboard can progressively be populated. Where it has not been possible to develop and implement a methodology for the dashboard tile concerned, this should be left blank to signify that this is an important issue that has not yet been quantified in a way that meets the criteria for the Dashboard. This will provide an incentive for a research agenda and possibly surveys and other mechanisms to fill the gap for a particular purpose eg a country assessment.
Where it has been possible to develop and implement a methodology but it has not been possible to find data for a specific country, even though it is known to exist, the Dashboard tile for that country should also be left blank and weighting between tiles conducted on the basis of the tiles that are present.
In some situations, missing data represents a zero score, for example if there is no evidence that geospatial data is used by the country’s statistical system. In such cases a score of zero should be recorded and used in the generation of indicators, weightings and rankings.
Here are a few concerns that I have at the moment, but this is by no means an exhaustive list. Overall, there are some indicators with issues in terms of coverage across countries. There are also differences in the length of the time series for each indicator. Here is a detailed breakdown by indicator.
Cleaning for Data Use Indicators. Data Use (5 Indicators):
- 1.1_DUNL - Indicator 1.1: Data use by national legislature
- 1.2_DUNE - Indicator 1.2: Data use by national executive branch
- 1.3_DUCS - Indicator 1.3: Data use by civil society
- 1.4_DUAC - Indicator 1.4: Data use by academia
- 1.5_DUIO - Indicator 1.5: Data use by international organizations
Based on PARIS21 data use indicator (Chapter 4 of Statistical Capacity Development Outlook) using national legislature website as source.
Based on PARIS21 data use indicator (Chapter 4 of Statistical Capacity Development Outlook) using national development plans and poverty reduction plans as a source.
Our main source for this is the FAOLEX database of legislation and policy documents maintained by the FAO.
Based on PARIS21 data use indicator (Chapter 4 of Statistical Capacity Development Outlook) using main social media platform in use in country as source.
We are pulling data from Twitter at this point. We have only accessed a handful of accounts from NSOs, but this could be quickly scaled up.
The idea for this indicator is that countries should be producing statistical products that are utilized, in this case by academia. As a measure of this, we use the IHSN database on censuses and surveys conducted by counties and calculate a ranking of countries based on the total number of downloads of the censuses and surveys produced by that country on a per capita basis. In other words, we calculate the total number of downloads of censuses and surveys for a country, divided by the population of that country, to produce a ranking. Countries with low scores on this metric, may either be producing censuses and surveys that are not in high demand or are not producing or adding a sufficient number of censuses and surveys to IHSN to register highly on this measure.
The source for this indicator is the IHSN.
The mission of the IHSN is to improve the availability, accessibility, and quality of survey data within developing countries, and to encourage the analysis and use of this data by national and international development decision makers, the research community, and other stakeholders.
To support this mission, the key objectives of the IHSN are:
Improved coordination of internationally sponsored survey programs, with emphasis on timing, sequencing, frequency, and cost-effectiveness Availability of coordinated and practical technical and methodological guidelines for all stages of the survey life cycle Availability of a central survey data catalog which would inform data users of the availability of survey and census data from multiple sources Availability of standards, tools, and guidelines that would allow data producers to document, disseminate, and preserve microdata according to international standards and best practices Improved collaboration between data producers and users
IHSN has a database of more than 7,483 surveys as of August 5, 2020.
Below is the list of AKI indicators:
AKI 1.5.1: Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)
AKI 1.5.2: Food Insecurity Experience Scale
AKI 1.5.3: Mortality rate, under-5 (per 1,000 live births)
AKI 1.5.4: Proportion of children and young people in grades 2 or 3 achieving at least a minimum proficiency level in reading and mathematics, by sex.
AKI 1.5.5: Maternal Mortality
AKI 1.5.6: People using safely managed drinking water services (% of population)
AKI 1.5.7: Access to electricity (% of population)
AKI 1.5.8: Unemployment, total (% of total labor force)
AKI 1.5.9: Manufacturing, value added (% of GDP)
AKI 1.5.10: Annualized average growth rate in per capita real survey mean consumption or income, bottom 40% of population (%)
AKI 1.5.11: Level of water stress: freshwater withdrawal as a proportion of available freshwater resources
AKI 1.5.12: Renewable energy consumption (% of total final energy consumption)
AKI 1.5.13: Households and NPISHs Final consumption expenditure (current LCU)
AKI 1.5.14: Quarterly GDP
AKI 1.5.15: Debt service (PPG and IMF only, % of exports of goods, services and primary income)
# pull in some WDI metadata which will be used for constructing AKI indicators.
# WDI metadata was gathered using previous vintages of WDI from 2016-19
# Metadata was gathered and saved as csv files in the 011_data folder
for (i in 2016:2019) {
temp <- read_csv(file = paste(raw_dir, "/metadata/WDI_metadata_",i,".csv", sep="" )) %>%
as_tibble(.name_repair='universal') %>%
mutate(National.accounts.reference.year=as.numeric(National.accounts.reference.year),
date=i)
assign(paste("WDI_metadata",i,sep="_"), temp)
}
WDI_metadata <- bind_rows(WDI_metadata_2016, WDI_metadata_2017, WDI_metadata_2018, WDI_metadata_2019)
First we will pull data for indicators coming straight from WDI. For some indicators, we will use alternative sources.
Indicators coming from the WDI directly are:
Scoring:
1 Point. 3 or more values available within past 5 years 0.6 Points. 2 values available within past 5 years;
0.3 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
##########
# Pull Tags for WDI Data
##########
# make request to World Bank API
wdiRequest <- GET(url = "http://api.worldbank.org/v2/indicator?per_page=20000&format=json&source=2")
wdiResponse <- content(wdiRequest, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
wdisJSON <- jsonlite::fromJSON(wdiResponse, flatten = TRUE) %>%
data.frame()
EdStatsRequest <- GET(url = "http://api.worldbank.org/v2/indicator?per_page=20000&format=json&source=12")
EdStatsResponse <- content(EdStatsRequest, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
EdStatsJSON <- jsonlite::fromJSON(EdStatsResponse, flatten = TRUE) %>%
data.frame()
aki<- c(
'Pupils below minimum reading proficiency at end of primary (%). Low GAML threshold',
'People using safely managed drinking water services (% of population)',
'Unemployment, total (% of total labor force) (national estimate)' ,
'Manufacturing, value added (% of GDP)',
'Annualized average growth rate in per capita real survey mean consumption or income, bottom 40% of population (%)',
'Level of water stress: freshwater withdrawal as a proportion of available freshwater resources',
'Renewable energy consumption (% of total final energy consumption)',
'Households and NPISHs Final consumption expenditure (current LCU)',
'Debt service (PPG and IMF only, % of exports of goods, services and primary income)' )
get_tag_aki_df<-wdisJSON %>%
bind_rows(EdStatsJSON) %>%
filter((name %in% aki )) %>%
arrange(factor(name, levels = aki)) %>%
select(id, name, sourceOrganization)
#get WDI metadata infor
cache_list<-wbstats::wbcache()
country_list <- wbstats::wbcountries()
aki_list<-get_tag_aki_df[,'id']
for (reference_year in 2016:2019) {
temp <-wbstats::wb(country="countries_only",
indicator=aki_list,
startdate=reference_year-5,
enddate=reference_year,
return_wide = T,
removeNA=FALSE) %>%
filter(((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>% #filter out years outside reference window of 3 years
write_excel_csv(path = paste(output_dir, "/D3.AKI_data_pull_",reference_year,".csv", sep="" )) %>%
mutate_at(.vars=aki_list, ~if_else(is.na(.),0,1)) %>% #create 0,1 variable for whether data point exists for country
group_by(iso3c, country) %>%
summarise_all((~(if(is.numeric(.)) sum(., na.rm = TRUE) else first(.)))) %>% #group by country to create one observation per country containing whether or not data point existed
mutate_at(.vars=aki_list, ~case_when(
.>=3 ~ 1,
.==2 ~ 0.6,
.==1 ~ 0.3,
.==0 ~ 0,
TRUE ~ 0
)) %>% # 1 point for at least 3 values, 0.6 for 2 values, 0.3 for 1 values, 0 otherwise
mutate(date=reference_year) %>%
ungroup() %>%
select(iso3c, country, date, aki_list) %>%
rename_at(aki_list, ~(paste('SPI.D1.5.',.,sep=""))) %>% #add 'D3.' as prefix before these indicators.
left_join(country_list) #attach country metadata
assign(paste("D3.AKI",reference_year,sep="_"), temp)
}
The data will be pulled from the WDI and combined with metadata from the Povcalnet
Scoring is as follows: Quality (0.5 points total): 0.5 Point. Comparable data lasting at least two years within past 5 years
0 Points. No comparable data within past 5 years
Frequency (0.5 points total): 0.5 Point. 3 or more values available within past 5 years
0.3 Points. 2 values available within past 5 years;
0.15 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
library(povcalnetR)
#the code below is based on public released code by povcalnet team:
#https://github.com/worldbank/Global_Poverty_Blogs/blob/master/bg_povcalnet_comparability/R/comparability_breaks.R
# some contants
year_range <- 1990:2020
metadata_path <- "https://development-data-hub-s3-public.s3.amazonaws.com/ddhfiles/506801/povcalnet_comparability.csv"
# Load data ---------------------------------------------------------------
metadata <- read_csv(metadata_path)
cov_lkup <- c(3, 2, 1, 4)
names(cov_lkup) <- c("N", "U", "R", "A")
dat_lkup <- c(2,1)
names(dat_lkup) <- c("income","consumption")
pcn <- povcalnet()
pcn$coveragetype <- cov_lkup[pcn$coveragetype]
pcn$datatype <- dat_lkup[pcn$datatype]
df <- pcn %>%
inner_join(metadata, by = c("countrycode", "year", "coveragetype", "datatype"))
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp1<-df %>%
filter(coveragetype %in% c(3,4)) %>% #keep just nationally representative samples
mutate(frequency=((reference_year-as.numeric(year))<=5) & (reference_year>=as.numeric(year))) %>%
filter(frequency==TRUE) %>%
group_by(countrycode, comparability) %>% #for each country and comparability type, get number of comparable estimates
summarise(SPI.QUAL.D1.5.POV=n()) %>%
ungroup() %>%
group_by(countrycode) %>% #now get a total by country with the max number of comparable estimates
summarise(SPI.QUAL.D1.5.POV=max(SPI.QUAL.D1.5.POV, na.rm=T)) %>%
mutate(SPI.QUAL.D1.5.POV=if_else(SPI.QUAL.D1.5.POV>=2,0.5,0)) #only give point if there is at least two observations that are comparable
temp <-df %>%
filter(coveragetype %in% c(3,4)) %>% #keep just nationally representative samples
mutate(frequency=((reference_year-as.numeric(year))<=5) & (reference_year>=as.numeric(year))) %>%
mutate(SPI.FREQ.D1.5.POV=if_else(frequency==TRUE,1,0)) %>% #create 0,1 variable for whether data point exists for country
group_by(countryname,countrycode) %>%
summarise(SPI.FREQ.D1.5.POV=sum(SPI.FREQ.D1.5.POV, na.rm=T)) %>%
mutate(SPI.FREQ.D1.5.POV=case_when(
SPI.FREQ.D1.5.POV>=3 ~ 0.5,
SPI.FREQ.D1.5.POV==2 ~ 0.3,
SPI.FREQ.D1.5.POV==1 ~ 0.15,
SPI.FREQ.D1.5.POV==0 ~ 0,
TRUE ~ 0
)) %>%
left_join(temp1) %>% #now bring in quality dimension
mutate(SPI.QUAL.D1.5.POV=if_else(!is.na(SPI.QUAL.D1.5.POV),SPI.QUAL.D1.5.POV,0)) %>% #recode some missing values as zero, who had no observations in 5 year window
mutate(SPI.D1.5.POV=SPI.FREQ.D1.5.POV+SPI.QUAL.D1.5.POV) %>%
mutate(date=reference_year,
country=countryname,
iso3c=countrycode) %>%
select( country,iso3c, date, starts_with('SPI.D1.5.POV'))
assign(paste("D3.1.AKI",reference_year,sep="_"), temp)
}
D3.1.AKI <- bind_rows(D3.1.AKI_2016, D3.1.AKI_2017, D3.1.AKI_2018, D3.1.AKI_2019)
We examine the Food Insecurity Experience Scale from the FAO (http://www.fao.org/faostat/en/#data/FS). Data for food insecurity was pulled on May 1, 2020.
Scoring 1 Point. 3 or more values available within past 5 years 0.6 Points. 2 values available within past 5 years;
0.3 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
#Read in data from UN Inter-agency Group for Child Mortality Estimation
D3.2.AKI.FIES <- read_csv(file=paste(raw_dir, '3.1_DPSS/D3.2.FIES.csv', sep="/")) %>%
rename(year_code="Year Code",
iso3c="ISO3 Code") %>%
mutate(date=as.numeric(str_extract(year_code, "^.{4}")),
SPI.D1.5.FIES=Value)
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp <-D3.2.AKI.FIES %>%
filter(((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>% #filter out years outside reference window of 5 years
mutate(SPI.D1.5.FIES=if_else(is.na(SPI.D1.5.FIES),0,1)) %>% #create 0,1 variable for whether data point exists for country
group_by(iso3c) %>%
summarise_all((~(if(is.numeric(.)) sum(., na.rm = TRUE) else first(.)))) %>% #group by country to create one observation per country containing whether or not data point existed
mutate(SPI.D1.5.FIES=case_when(
SPI.D1.5.FIES>=3 ~ 1,
SPI.D1.5.FIES==2 ~ 0.6,
SPI.D1.5.FIES==1 ~ 0.3,
SPI.D1.5.FIES==0 ~ 0,
TRUE ~ 0
)) %>% # 1 point for at least 3 values, 0.6 for 2 values, 0.3 for 1 values, 0 otherwise
mutate(date=reference_year) %>%
select( iso3c, date, starts_with('SPI.'))
assign(paste("D3.2.AKI",reference_year,sep="_"), temp)
}
D3.2.AKI <- bind_rows(D3.2.AKI_2016, D3.2.AKI_2017, D3.2.AKI_2018, D3.2.AKI_2019)
The data in WDI is modeled, based on HHS and Vital Registration. We use the following as a source for raw data produced by national statistical offices that is used in this modeling.
https://childmortality.org/data
Data was pulled on April 13, 2020.
Countries are ranked by source quality (admin data > survey > no data) and frequency. Score on this indicator has a max of 1 point with 0.5 points for source quality and 0.5 points for frequency. Detailed scoring for source quality and frequency components are below.
Quality (0.5 points total): 0.5 Point. Vital Registration data available within past 5 years
0.25 Points. Survey Data availabe, but no Vital Registration data within past 5 years;
0 Points. None within past 5 years
Frequency (0.5 points total): 0.5 Point. 3 or more values available within past 5 years
0.3 Points. 2 values available within past 5 years;
0.15 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
#Read in data from UN Inter-agency Group for Child Mortality Estimation
D3.3.AKI.MORT <- read_csv(file=paste(raw_dir, '3.1_DPSS/D3.3_child_mort_RAW_v01.csv', sep="/")) %>%
filter(INDICATOR=='Under-five mortality rate' & SEX=='Total') %>% #keep just observations for under 5 child mortality and for both sexes
filter(OBS_STATUS=='Included in IGME') %>% # Also keep only surveys that met IGME criteria for inclusion as a nationally representative statistic
mutate(date=as.numeric(str_extract(TIME_PERIOD, "^.{4}")),
country=REF_AREA,
D3.CHLD.MORT=OBS_VALUE)
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp <-D3.3.AKI.MORT %>%
mutate(frequency=((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>%
mutate(SPI.FREQ.D1.5.CHLD.MORT=if_else(frequency==TRUE,1,0)) %>% #create 0,1 variable for whether data point exists for country
mutate(SPI.QUAL.D1.5.CHLD.MORT=case_when(
reference_year-as.numeric(date)>=5 ~ 0, #outside 5 year window
SERIES_METHOD=='Vital Registration' ~ 0.5,
SERIES_METHOD!='Vital Registration' ~ 0.25,
TRUE ~ as.numeric(NA)
)) %>%
group_by(country) %>%
summarise(SPI.FREQ.D1.5.CHLD.MORT=sum(SPI.FREQ.D1.5.CHLD.MORT, na.rm=T),
SPI.QUAL.D1.5.CHLD.MORT=max(SPI.QUAL.D1.5.CHLD.MORT, na.rm=T)) %>%
mutate(SPI.FREQ.D1.5.CHLD.MORT=case_when(
SPI.FREQ.D1.5.CHLD.MORT>=3 ~ 0.5,
SPI.FREQ.D1.5.CHLD.MORT==2 ~ 0.3,
SPI.FREQ.D1.5.CHLD.MORT==1 ~ 0.15,
SPI.FREQ.D1.5.CHLD.MORT==0 ~ 0,
TRUE ~ 0
)) %>%
mutate(SPI.D1.5.CHLD.MORT=SPI.FREQ.D1.5.CHLD.MORT+SPI.QUAL.D1.5.CHLD.MORT) %>%
mutate(date=reference_year) %>%
select( country, date, starts_with('SPI.'))
assign(paste("D3.3.AKI",reference_year,sep="_"), temp)
}
D3.3.AKI <- bind_rows(D3.3.AKI_2016, D3.3.AKI_2017, D3.3.AKI_2018, D3.3.AKI_2019)
The data is sourced from the the Inter-Agency Group (200 countries) to improve time and country coverages. Tne national estimates with the source information are available at https://www.who.int/reproductivehealth/publications/maternal-mortality-2000-2017/en/. The data was accessed on April 3, 2020. This data was used for modelling maternal mortality, which are then fed into the modelled estimates on Maternal Mortality (SH.STA.MMRT) in the WDI.
For our purposes, we will use the raw data from the country surveys as our data for the availability of key indicators, since the modelled estimates are produced out of sample, and thus not reflective of a country’s national statistical system.
Countries are ranked by source quality (admin data > survey > no data) and frequency. Score on this indicator has a max of 1 point with 0.5 points for source quality and 0.5 points for frequency. Detailed scoring for source quality and frequency components are below.
Quality (0.5 points total): 0.5 Point. Vital Registration data available within past 5 years
0.25 Points. Survey Data or Census Data availabe, but no Vital Registration data within past 5 years;
0 Points. None within past 5 years
Frequency (0.5 points total): 0.5 Point. 3 or more values available within past 5 years
0.3 Points. 2 values available within past 5 years;
0.15 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
#Read in data from UN Inter-agency Group
D3.5.AKI.MMR <- read_csv(file=paste(raw_dir, '3.1_DPSS/D3.5.BMat2019_datainputs.csv', sep="/")) %>%
filter(modelinclude=='TRUE') %>% #keep just observations that the UN inter-agency group deemed credible for use in modelling
mutate(date=end,
iso3c=iso,
D3.MMRT=pm_obs)
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp <-D3.5.AKI.MMR %>%
mutate(frequency=((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>%
mutate(SPI.FREQ.D1.5.MMRT=if_else(frequency==TRUE,1,0)) %>% #create 0,1 variable for whether data point exists for country
mutate(SPI.QUAL.D1.5.MMRT=case_when(
reference_year-as.numeric(date)>=5 ~ 0, #outside 5 year window
type=='vr' ~ 0.5,
type!='vr' ~ 0.25,
TRUE ~ as.numeric(NA)
)) %>%
group_by(iso3c) %>%
summarise(SPI.FREQ.D1.5.MMRT=sum(SPI.FREQ.D1.5.MMRT, na.rm=T),
SPI.QUAL.D1.5.MMRT=max(SPI.QUAL.D1.5.MMRT, na.rm=T)) %>%
mutate(SPI.FREQ.D1.5.MMRT=case_when(
SPI.FREQ.D1.5.MMRT>=3 ~ 0.5,
SPI.FREQ.D1.5.MMRT==2 ~ 0.3,
SPI.FREQ.D1.5.MMRT==1 ~ 0.15,
SPI.FREQ.D1.5.MMRT==0 ~ 0,
TRUE ~ 0
)) %>%
mutate(SPI.D1.5.MMRT=SPI.FREQ.D1.5.MMRT+SPI.QUAL.D1.5.MMRT) %>%
mutate(date=reference_year) %>%
select( iso3c, date, starts_with('SPI.'))
assign(paste("D3.5.AKI",reference_year,sep="_"), temp)
}
D3.5.AKI <- bind_rows(D3.5.AKI_2016, D3.5.AKI_2017, D3.5.AKI_2018, D3.5.AKI_2019)
Access to Electricity (% of population) has 99% coverage in the WDI database making it less useful for distinguishing between countries. In order to improve the usefulness of this indicator, we are making use of data compiled by World Bank colleagues on surveys containing electricity items. According to the access to electricity metadata, around 42 countries have no data and an imputed value is assigned based on the regional average. By going to survey source data, we address this issue of imputation, which gives a misleading picture of availability for our purposes. Below is the metadata from the WDI for methods for the access to electricity indicator.
Data for access to electricity are collected among different sources: mostly data from nationally representative household surveys (including national censuses) were used. Survey sources include Demographic and Health Surveys (DHS) and Living Standards Measurement Surveys (LSMS), Multi-Indicator Cluster Surveys (MICS), the World Health Survey (WHS), other nationally developed and implemented surveys, and various government agencies (for example, ministries of energy and utilities). Given the low frequency and the regional distribution of some surveys, a number of countries have gaps in available data. To develop the historical evolution and starting point of electrification rates, a simple modeling approach was adopted to fill in the missing data points - around 1990, around 2000, and around 2010. Therefore, a country can have a continuum of zero to three data points. There are 42 countries with zero data point and the weighted regional average was used as an estimate for electrification in each of the data periods. 170 countries have between one and three data points and missing data are estimated by using a model with region, country, and time variables. The model keeps the original observation if data is available for any of the time periods. This modeling approach allowed the estimation of electrification rates for 212 countries over these three time periods (Indicated as “Estimate”). Notation “Assumption” refers to the assumption of universal access in countries classified as developed by the United Nations. Data begins from the year in which the first survey data is available for each country.
Scoring 1 Point. 3 or more values available within past 5 years 0.6 Points. 2 values available within past 5 years;
0.3 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
#read in file from rawdata folder
D3.7.ELEC <- read_csv(file = paste(raw_dir, "3.2_DPES/D3.7.ELEC.csv", sep="/" )) %>%
rename(country=Country,
iso3c="Country code",
date=Time)
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp <-D3.7.ELEC %>%
mutate(frequency=((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>%
mutate(SPI.D1.5.ELEC=if_else(frequency==TRUE,1,0)) %>% #create 0,1 variable for whether data point exists for country
group_by(country) %>%
summarise(SPI.D1.5.ELEC=sum(SPI.D1.5.ELEC, na.rm=T)) %>%
mutate(SPI.D1.5.ELEC=case_when(
SPI.D1.5.ELEC>=3 ~ 1,
SPI.D1.5.ELEC==2 ~ 0.6,
SPI.D1.5.ELEC==1 ~ 0.3,
SPI.D1.5.ELEC==0 ~ 0,
TRUE ~ 0
)) %>%
mutate(date=reference_year) %>%
select( country, date, starts_with('SPI.'))
assign(paste("D3.7.AKI",reference_year,sep="_"), temp)
}
D3.7.AKI <- bind_rows(D3.7.AKI_2016, D3.7.AKI_2017, D3.7.AKI_2018, D3.7.AKI_2019)
Quarterly GDP numbers were pulled from the IMF website.
After GDP by expenditure, quarterly GDP is probably the most important development to be made in a system of National Accounts, before the development of full sectoral accounts. IMF IFS has quarterly GDP from various sources, including governments and international agencies at https://data.imf.org/?sk=4C514D48-B6BA-49ED-8AB9-52B0C1A0179B
Scoring 1 Point. 3 or more values available within past 5 years 0.6 Points. 2 values available within past 5 years;
0.3 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
#read in file from rawdata folder
D3.14.QUART.GDP <- read_csv(file = paste(raw_dir, "3.2_DPES/D3.14.QUART.GDP.csv", sep="/" ))
#clean data and produce indicator for each year
#Now loop from 2016 and 2019, keeping just data inside last 5 years.
for (reference_year in 2016:2019) {
temp <-D3.14.QUART.GDP %>%
mutate(date=as.numeric(str_sub(TIME_PERIOD,1,4))) %>%
mutate(iso2c=REF_AREA) %>%
mutate(frequency=((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>%
mutate(SPI.D1.5.QUART.GDP=if_else(frequency==TRUE,1,0)) %>% #create 0,1 variable for whether data point exists for country
group_by(iso2c, date) %>%
summarise(SPI.D1.5.QUART.GDP=max(SPI.D1.5.QUART.GDP, na.rm=T)) %>%
group_by(iso2c) %>%
summarise(SPI.D1.5.QUART.GDP=sum(SPI.D1.5.QUART.GDP, na.rm=T)) %>%
mutate(SPI.D1.5.QUART.GDP=case_when(
SPI.D1.5.QUART.GDP>=3 ~ 1,
SPI.D1.5.QUART.GDP==2 ~ 0.6,
SPI.D1.5.QUART.GDP==1 ~ 0.3,
SPI.D1.5.QUART.GDP==0 ~ 0,
TRUE ~ 0
)) %>%
mutate(date=reference_year) %>%
select( iso2c, date, starts_with('SPI.D1.5.QUART.GDP'))
assign(paste("D3.14.AKI",reference_year,sep="_"), temp)
}
D3.17.AKI <- bind_rows(D3.14.AKI_2016, D3.14.AKI_2017, D3.14.AKI_2018, D3.14.AKI_2019)
For this indicator, Debt service (PPG and IMF only, % of exports of goods, services and primary income), we will pull data from the WDI but modify the scoring using the WDI metadata on whether the external debt data is actual, estimated, or preliminary. The status “as reported (actual)” indicates that the country was fully current in its reporting under the DRS and that World Bank staff are satisfied that the reported data give an adequate and fair representation of the country’s total public debt. “Preliminary” data are based on reported or collected information, but because of incompleteness or other reasons, an element of staff estimation is included. “Estimated” data indicate that countries are not current in their reporting and that a significant element of staff estimation has been necessary for producing the data tables.
Scoring is as follows:
Frequency:
0.5 Point. 3 or more values available within past 5 years
0.3 Points. 2 values available within past 5 years;
0.15 Points. 1 values available within past 5 years;
0 Points. None within past 5 years
Quality:
0.5 Points. Actual value 0.3 Points. Preliminary value 0.15 Points. Estimated value 0 Points. No value
#reshape metaadata file
metadata_3.15 <- WDI_metadata %>%
transmute(iso3c=if_else(is.na(Country.Code), Code, Country.Code),
date=date,
External_debt_Reporting=External.debt.Reporting.status) %>%
mutate(SPI.QUAL.D1.5.DT.TDS.DPPF.XP.ZS= case_when(
External_debt_Reporting=='Actual' ~ 0.5,
External_debt_Reporting=='Preliminary' ~ 0.3,
External_debt_Reporting=='Estimate' ~ 0.15,
TRUE ~ 0
))
#pull data from wdi and merge with metadata
for (reference_year in 2016:2019) {
temp <-wbstats::wb(country="countries_only",
indicator='DT.TDS.DPPF.XP.ZS',
startdate=reference_year-5,
enddate=reference_year,
return_wide = T,
removeNA=FALSE) %>%
filter(((reference_year-as.numeric(date))<=5) & (reference_year>=as.numeric(date))) %>% #filter out years outside reference window of 3 years
mutate_at(.vars=c('DT.TDS.DPPF.XP.ZS'), ~if_else(is.na(.),0,1)) %>% #create 0,1 variable for whether data point exists for country
group_by(iso3c, country) %>%
summarise_all((~(if(is.numeric(.)) sum(., na.rm = TRUE) else first(.)))) %>% #group by country to create one observation per country containing whether or not data point existed
mutate(SPI.FREQ.D1.5.DT.TDS.DPPF.XP.ZS = case_when(
DT.TDS.DPPF.XP.ZS>=3 ~ 0.5,
DT.TDS.DPPF.XP.ZS==2 ~ 0.3,
DT.TDS.DPPF.XP.ZS==1 ~ 0.15,
DT.TDS.DPPF.XP.ZS==0 ~ 0,
TRUE ~ 0
)) %>% # 0.5 point for at least 3 values, 0.3 for 2 values, 0.15 for 1 values, 0 otherwise
mutate(date=reference_year) %>%
left_join(metadata_3.15) %>% #attach country metadata
mutate(SPI.QUAL.D1.5.DT.TDS.DPPF.XP.ZS= case_when(
External_debt_Reporting=='Actual' ~ 0.5,
External_debt_Reporting=='Preliminary' ~ 0.3,
External_debt_Reporting=='Estimate' ~ 0.15,
TRUE ~ 0
)) %>% # 0.5 point for actual, 0.3 for preliminary, 0.15 for estimate
mutate(
SPI.D1.5.DT.TDS.DPPF.XP.ZS=(SPI.QUAL.D1.5.DT.TDS.DPPF.XP.ZS + SPI.FREQ.D1.5.DT.TDS.DPPF.XP.ZS)) %>%
ungroup() %>%
select(iso3c, country, date, contains('D3.DT.TDS.DPPF.XP.ZS'))
assign(paste("D3.15.AKI",reference_year,sep="_"), temp)
}
D3.15.AKI <- bind_rows(D3.15.AKI_2016, D3.15.AKI_2017, D3.15.AKI_2018, D3.15.AKI_2019)
#now combine AKI databases and write to csv
D3.AKI <- bind_rows(D3.AKI_2016, D3.AKI_2017, D3.AKI_2018, D3.AKI_2019) %>%
left_join(D3.1.AKI) %>%
left_join(D3.2.AKI) %>%
left_join(D3.3.AKI) %>%
left_join(D3.5.AKI) %>%
left_join(D3.7.AKI) %>%
left_join(D3.15.AKI) %>%
left_join(D3.17.AKI) %>%
mutate_at(vars(starts_with('SPI.D1.5.')), ~if_else(is.na(.),0,as.numeric(.)))
spi_df <- spi_df %>%
left_join(D3.AKI)
aki <- c(
SPI.D1.5.POV='Poverty headcount ratio at $1.90 a day (2011 PPP) (% of population)',
SPI.D1.5.FIES='Food Insecurity Experience Scale',
SPI.D1.5.CHLD.MORT='Mortality rate, under-5 (per 1,000 live births)',
SPI.D1.5.SE.LPV.PRIM.BMP='Pupils below minimum reading proficiency at end of primary (%). Low GAML threshold',
SPI.D1.5.MMRT='Maternal Mortality',
SPI.D1.5.SH.H2O.SMDW.ZS='People using safely managed drinking water services (% of population)',
SPI.D1.5.ELEC='Access to electricity (% of population)',
SPI.D1.5.SL.UEM.TOTL.NE.ZS='Unemployment, total (% of total labor force) (national estimate)' ,
SPI.D1.5.NV.IND.MANF.ZS='Manufacturing, value added (% of GDP)',
SPI.D1.5.SI.SPR.PC40.ZG='Annualized average growth rate in per capita real survey mean consumption or income, bottom 40% of population (%)',
SPI.D1.5.ER.H2O.FWST.ZS='Level of water stress: freshwater withdrawal as a proportion of available freshwater resources',
SPI.D1.5.EG.FEC.RNEW.ZS='Renewable energy consumption (% of total final energy consumption)',
SPI.D1.5.NE.CON.PRVT.CN='Households and NPISHs Final consumption expenditure (current LCU)',
SPI.D1.5.QUART.GDP='Quarterly GDP',
SPI.D1.5.DT.TDS.DPPF.XP.ZS='Debt service (PPG and IMF only, % of exports of goods, services and primary income)'
)
for (ind in names(aki)) {
ind <- ind
names(aki)
print(ind)
temp_map <- D3.AKI %>%
filter(.data[[ind]] > 0)
spi_mapper('temp_map', ind, aki[[ind]])
}
## [1] "SPI.D1.5.POV"
## [1] "SPI.D1.5.FIES"
## [1] "SPI.D1.5.CHLD.MORT"
## [1] "SPI.D1.5.SE.LPV.PRIM.BMP"
## [1] "SPI.D1.5.MMRT"
## [1] "SPI.D1.5.SH.H2O.SMDW.ZS"
## [1] "SPI.D1.5.ELEC"
## [1] "SPI.D1.5.SL.UEM.TOTL.NE.ZS"
## [1] "SPI.D1.5.NV.IND.MANF.ZS"
## [1] "SPI.D1.5.SI.SPR.PC40.ZG"
## [1] "SPI.D1.5.ER.H2O.FWST.ZS"
## [1] "SPI.D1.5.EG.FEC.RNEW.ZS"
## [1] "SPI.D1.5.NE.CON.PRVT.CN"
## [1] "SPI.D1.5.QUART.GDP"
## [1] "SPI.D1.5.DT.TDS.DPPF.XP.ZS"
Cleaning for Data Services Indicators. Data Services (4 Indicators):
- 2.1_DSDR - Indicator 2.1: Data releases
- 2.2_DSOA - Indicator 2.2: Online access
- 2.3_DSAS - Indicator 2.3: Advisory/ Analytical Services
- 2.4_DSDS - Indicator 2.4: Data services
Data Dissemination Standard (SDDS) and electronic General Data Dissemination Standard (e-GDDS) were established by the International Monetary Fund (IMF) for member countries that have or that might seek access to international capital markets, to guide them in providing their economic and financial data to the public. Although subscription is voluntary, the subscribing member needs to be committed to observing the standard and provide information about its data and data dissemination practices (metadata). The metadata are posted on the IMF’s SDDS and e-GDDS websites.
1 Point. Subscribing to IMF SDDS+ or SDDS standards 0.5 Points. Subscribing to IMF e-GDDS standards 0 Points. Otherwise
Request_metadata <- GET(url = "http://api.worldbank.org/v2/country/all/indicator/5.21.01.01.sdds?format=json&date=2004:2015&per_page=5000")
Response_metadata <- content(Request_metadata, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
D2.1_DSDR_sci <- jsonlite::fromJSON(Response_metadata, flatten = TRUE) %>%
data.frame() %>%
transmute(
iso3c=countryiso3code,
country=country.value,
date=as.numeric(date),
SDDS=value
) %>%
left_join(spi_df_empty) %>% #add on country metadata
filter(!is.na(income)) %>%
select(iso3c, country, date, SDDS )
#read in csv file.
D2.1_DSDR <- read_csv(file = paste(raw_dir, '/2.1_DSDR/', "D2.1.GDDS.csv", sep="" )) %>%
mutate(SPI.D2.1.GDDS=case_when(
IDDS=="SDDS Plus" | IDDS=="SDDS"~ 1,
IDDS=="e-GDDS"~ 0.5,
TRUE ~ 0 ),
SDDS=case_when(
IDDS=="SDDS"~ 1,
TRUE ~ 0 )) %>%
bind_rows(D2.1_DSDR_sci) %>%
rename(RAW.D2.1.IDDS=IDDS) %>%
select(iso3c, country, date, RAW.D2.1.IDDS, SPI.D2.1.GDDS ) %>%
arrange(date, country)
D2.1_DSDR_map <- D2.1_DSDR %>%
filter(RAW.D2.1.IDDS>0)
spi_mapper('D2.1_DSDR_map', 'SPI.D2.1.GDDS', 'SDDS Data Standards in Place' )
#add to spi databases
spi_df <- spi_df %>%
left_join(D2.1_DSDR)
This indicator measures the richness and openness of online access.
Our source for this indicator is Open Data Watch. From Open Data watch:
The Open Data Inventory (ODIN) assesses the coverage and openness of official statistics to help identify gaps, promote open data policies, improve access, and encourage dialogue between national statistical offices (NSOs) and data users. ODIN 2018/19 includes 178 countries, including most all OECD countries. Two-year comparisons are for all countries with two years of data between 2015-2017. Scores can be compared across topics and countries.
We use the Openness score from ODIN for this measure. The score ranges from 0-100. It contains scores along five dimensions:
- Machine Readability
- Non-Proprietary format
- Download Options
- Metadata Available
- Terms of Use
A description for each of these five dimensions is below:
Openness element 1 measures whether data are available in a machine readable format such as XLS, XLSX, CSV, and JSON. Machine-readable file formats allow users to easily process data using a computer. When data are made available in formats that are not machine readable, users cannot easily access and modify the data, which severely restricts the scope of the data’s use. In many cases PDF versions of datasets within reports can be useful to users, as the text in conjunction with the tables gives context and explanation to the figures which helps less technical users understand the data. Because of this, ODIN assessments do not penalize countries for making datasets available in PDF or other non-machine readable formats, unless these formats are the only option for exporting data. Scores are not penalized for having identical datasets in both machine readable and non-readable formats. Compression formats do not affect machine readability scores, only non-proprietary scores (see next page). Scores are given by data category, not indicator.
For the elements of data openness, scoring is calculated independent of the data coverage. If data files are compressed in RAR format (which is proprietary), data for that indicator should be considered proprietary even if the enclosing files are in a non-proprietary format. Files compressed in ZIP format are not affected.
Openness element 3 measures whether data are available with three different download options: bulk download, API, and user-select options. A bulk download is defined at the indicator level as: The ability to download all data recorded in ODIN for a particular indicator (all years, disaggregations, and subnational data) in one file, or multiple files that can be downloaded simultaneously. Bulk downloads are a key component of the Open Definition, which requires data to be “provided as a whole . . . and downloadable via the internet.” User-selectable download options are defined as: Users must be able to select an indicator and at least one other dimension to create a download or table. These dimensions could include time periods, geographic disaggregations, or other recommended disaggregations. An option to choose the file export format is not enough. API stands for Application Programming Interface. Ideally, APIs should be clearly displayed on the website. ODIN assumes APIs are available for the NSOs entire data collection used in ODIN, unless clearly stated. ODIN assessors do not register for use or test API functionality. For more information on APIs, see this guide. Scores are given by data category, not indicator.
Openness element 4 measures whether metadata are made available. Scores are given by data category, not indicator. Metadata are defined at the indicator level as information about how the data are defined/calculated and collected. ODIN classifies metadata into three categories: (1) Not Available, (2) Incomplete, and (3) Complete. The following must be available to classify metadata as complete: • Definition of the indicator, or definition of key terms used in the indicator description (as applicable), or how the indicator was calculated. • Publication (date of upload), compilation date (date on front of report is not sufficient), or date dataset was last updated. • Name of data source (what agency collected the data). If the metadata only have one or two of the above elements, they are scored as incomplete
Openness element 5 measures whether data are available with an open terms of use. Generally, terms of use (TOU) will apply to an entire website or data portal (unless otherwise specified). In these cases, all data found on the same website and/or portal will receive the same score. If a portal is located on the same domain as the NSO website, the terms of use on the NSO site will apply. If the data are located on a portal or website on a different domain, another terms of use will need to be present. For a policy/ license to be accepted as a terms of use, it must clearly refer to the data found on the website. Terms of use that refer to nondata content (such as pictures, logos, etc.) of the website are not considered. A copyright symbol at the bottom of the page is not sufficient. A sentence indicating a recommended citation format is not sufficient. Terms of use are classified the following ways: (1) Not Available, (2) Restrictive, (3) Semi-Restrictive, and (4) Open. If the TOU contains one or more restrictive clauses, it receives 0 points and is classified as “restrictive.” Restrictive clauses include:
For more details, consult the ODIN technical documentation: https://docs.google.com/document/d/1ubPL1l_3im9bjlCVZ6W2ICAy6UAiXl1hGeA1aXImkxI/edit#
#read in ODIN data
for (i in 2015:2018) {
temp <- read_csv(paste(raw_dir, '/2.2_DSOA/','ODIN_',i,'.csv', sep="")) %>%
as_tibble(.name_repair = 'universal') %>%
mutate(date=i) %>%
filter(Data.categories=='All Categories')
assign(paste('openness_df',i,sep="_"), temp)
}
#bind different years together
openness_df <- bind_rows(openness_df_2015, openness_df_2016, openness_df_2017, openness_df_2018)
openness_df <- openness_df %>%
select(Country.Code, date, Machine.readable, Non.proprietary, Download.options, Metadata.available, Terms.of.use, Openness.subscore) %>%
rename(iso3c=Country.Code) %>%
group_by( iso3c) %>%
right_join(country_metadata) %>%
mutate(
across(c('Machine.readable', 'Non.proprietary', 'Download.options', 'Metadata.available', 'Terms.of.use', 'Openness.subscore'),~if_else(is.na(.), 0, .) )
)
#do some quick renaming and formatting
openness_df_temp1 <- openness_df %>%
rename_with(~paste('RAW.D2.2',., sep="."), .cols=c('Machine.readable', 'Non.proprietary', 'Download.options', 'Metadata.available', 'Terms.of.use', 'Openness.subscore') )
openness_df_temp2 <- openness_df %>%
mutate(across(c('Machine.readable', 'Non.proprietary', 'Download.options', 'Metadata.available', 'Terms.of.use', 'Openness.subscore'), ~./100)) %>%
rename_with(~paste('SPI.D2.2',., sep="."), .cols=c('Machine.readable', 'Non.proprietary', 'Download.options', 'Metadata.available', 'Terms.of.use', 'Openness.subscore') )
openness_df <- openness_df_temp1 %>%
left_join(openness_df_temp2)
#add to spi dataframe
spi_df <- spi_df %>%
left_join(openness_df)
No established source exists for this indicator. This is experimental.
NSO has a listing of surveys and microdata sets that can provide the necessary data and reference for follow-up. Upon well-defined request and procedure per the national law and practice, users and practitioners can obtain the data collected from the households and businesses when needed.
NADA is an open source microdata cataloging system, compliant with the Data Documentation Initiative (DDI) and Dublin Core’s RDF metadata standards. It serves as a portal for researchers to browse, search, compare, apply for access, and download relevant census or survey datasets, questionnaires, reports and other information.
1 Point. Yes 0 Points. No
#read in csv file.
D2.4_NADA <- read_csv(file = paste(raw_dir, '/2.4_DSDS/', "D2.4.NADA.csv", sep="/" )) %>%
mutate(SPI.D2.4.NADA=case_when(
NADA==1 ~ 1,
NADA==0 ~ 0,
TRUE ~ 0
)) %>%
rename(RAW.D2.4.NADA=NADA,
RAW.D2.4.NADA_text=NADA_text) %>%
select(iso3c, country, date, RAW.D2.4.NADA,RAW.D2.4.NADA_text, SPI.D2.4.NADA ) %>%
arrange(date, country)
D2.4_NADA_map <- D2.4_NADA %>%
filter(SPI.D2.4.NADA>0)
spi_mapper('D2.4_NADA_map', 'RAW.D2.4.NADA', 'NADA in Place' )
#add to spi databases
spi_df <- spi_df %>%
left_join(D2.4_NADA)
Cleaning for Data Products Indicators. Data Products (17 Indicators):
Download the latest SDG indicator data from UN Stats (https://unstats.un.org/sdgs/indicators/en/#) using their API
Transform the data so that for each indicator we can create a score documenting whether a value exists for the country in a year, whether the value is based on country data, country data adjusted, estimated, or modelled data according the UN Stats metadata. This will only include tier 1 indicators.
Combine the resulting data into a single set of indicators for use in the Statistical Performance Indicators dashboard and index by calculating the average across SDGs. This results in 17 total indicators. One for each SDG, which is the share of indicators with a value meeting quality criteria defined below.
Below is a paraphrased description from the UN stats webpage (https://unstats.un.org/sdgs/indicators/indicators-list/):
The global indicator framework for Sustainable Development Goals was developed by the Inter-Agency and Expert Group on SDG Indicators (IAEG-SDGs) and agreed upon at the 48th session of the United Nations Statistical Commission held in March 2017.
The global indicator framework includes 231 unique indicators. Please note that the total number of indicators listed in the global indicator framework of SDG indicators is 247. However, twelve indicators repeat under two or three different targets.
For each value of the indicator, the responsible international agency has been requested to indicate whether the national data were adjusted, estimated, modelled or are the result of global monitoring. The “nature” of the data in the SDG database is determined as follows:
Country data (C): Produced and disseminated by the country (including data adjusted by the country to meet international standards);
Country data adjusted (CA): Produced and provided by the country, but adjusted by the international agency for international comparability to comply with internationally agreed standards, definitions and classifications;
Estimated (E): Estimated based on national data, such as surveys or administrative records, or other sources but on the same variable being estimated, produced by the international agency when country data for some year(s) is not available, when multiple sources exist, or when there are data quality issues;
Modelled (M): Modelled by the agency on the basis of other covariates when there is a complete lack of data on the variable being estimated;
Global monitoring data (G): Produced on a regular basis by the designated agency for global monitoring, based on country data. There is no corresponding figure at the country level.
For each indicator, we will produce a value for each country with the following coding scheme:
We give countries no credit for modeled data, because the country did not produce indicators in a form that was directly usable for reporting on an SDG indicator.
When we average over all indicators in a goal to get a score, we compute a 5 year moving average to avoid year to year variability in reporting for SDGs. The overall score for an SDG is then the 5 year average of the percentage of indicator values based on country, country data adjusted, or estimated or Global Monitoring data that were available for the SDG.
Because of the large data files and runtime of the program to calculate these indicators, the code to produce these indicators is in a separate file. https://github.com/stacybri/UN_Stats_SDG_Indicators_SPI/blob/master/02_programs/un_stats_cleaning.Rmd
un_sdg_df <- read_csv(paste(raw_dir,'/3_DP/SPI_D3_UNSD_data_5yr.csv',sep="")) %>%
rename(SPI.D3.1.POV = SPI.D3.1,
SPI.D3.2.HNGR = SPI.D3.2,
SPI.D3.3.HLTH = SPI.D3.3,
SPI.D3.4.EDUC = SPI.D3.4,
SPI.D3.5.GEND = SPI.D3.5,
SPI.D3.6.WTRS = SPI.D3.6,
SPI.D3.7.ENRG = SPI.D3.7,
SPI.D3.8.WORK = SPI.D3.8,
SPI.D3.9.INDY = SPI.D3.9,
SPI.D3.10.NEQL = SPI.D3.10,
SPI.D3.11.CITY = SPI.D3.11,
SPI.D3.12.CNSP = SPI.D3.12,
SPI.D3.13.CLMT = SPI.D3.13,
SPI.D3.14.LFWT = SPI.D3.14,
SPI.D3.15.LAND = SPI.D3.15,
SPI.D3.16.INST = SPI.D3.16,
SPI.D3.17.PTNS = SPI.D3.17
) %>%
select(country, date, starts_with("SPI"))
#add to spi databases
spi_df <- spi_df %>%
left_join(un_sdg_df)
#map the values
un_sdg_map <- un_sdg_df %>%
select(country, date, starts_with("SPI"))
spi_mapper('un_sdg_map','SPI.D3.1.POV','Goal 1: No Poverty')
spi_mapper('un_sdg_map','SPI.D3.2.HNGR','Goal 2: No Hunger')
spi_mapper('un_sdg_map','SPI.D3.3.HLTH','Goal 3: Good Health and Well-being')
spi_mapper('un_sdg_map','SPI.D3.4.EDUC','Goal 4: Quality Education ')
spi_mapper('un_sdg_map','SPI.D3.5.GEND','Goal 5: Gender Equality ')
spi_mapper('un_sdg_map','SPI.D3.6.WTRS','Goal 6: Clean Water and Sanitation ')
spi_mapper('un_sdg_map','SPI.D3.7.ENRG','Goal 7: Affordable and Clean Energy')
spi_mapper('un_sdg_map','SPI.D3.8.WORK','Goal 8: Decent Work and Economic Growth ')
spi_mapper('un_sdg_map','SPI.D3.9.INDY','Goal 9: Industry, Innovation and Infrastructure ')
spi_mapper('un_sdg_map','SPI.D3.10.NEQL','Goal 10: Reduced Inequality ')
spi_mapper('un_sdg_map','SPI.D3.11.CITY','Goal 11: Sustainable Cities and Communities')
spi_mapper('un_sdg_map','SPI.D3.12.CNSP','Goal 12: Responsible Consumption and Production ')
spi_mapper('un_sdg_map','SPI.D3.13.CLMT','Goal 13: Climate Action')
spi_mapper('un_sdg_map','SPI.D3.14.LFWT','Goal 14: Life Below Water')
spi_mapper('un_sdg_map','SPI.D3.15.LAND','Goal 15: Life on Land')
spi_mapper('un_sdg_map','SPI.D3.16.INST','Goal 16: Peace and Justice Strong Institutions')
spi_mapper('un_sdg_map','SPI.D3.17.PTNS','Goal 17: Partnerships to achieve the Goal ')
Cleaning for the Data Sources Indicators. Data Sources (4 Indicators):
- 4.1_SOCS - Indicator 4.1: censuses and surveys
- 4.2_SOAD - Indicator 4.2: administrative data
- 4.3_SOGS - Indicator 4.3: geospatial data
- 4.4_SOPC - Indicator 4.4: private/citizen generated data
This indicator draws from data collected by the Statistical Performance Indicators team. The following censuses and surveys are considered:
Request_metadata <- GET(url = "https://api.worldbank.org/v2/sources/2/country/all/metadata?per_page=30000&format=json")
Response_metadata <- content(Request_metadata, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
JSON_metadata <- jsonlite::fromJSON(Response_metadata, flatten = TRUE) %>%
data.frame()
#do some conversion to produce a dataframe
df <- JSON_metadata$source.concept %>%
data.frame()
#do some more cleaning and conversion
df <- df$variable %>%
data.frame() %>%
rename(iso3c=id) %>%
unnest(metatype) %>%
pivot_wider(
id_cols=iso3c,
names_from=id,
values_from=value
)
census_df <- df %>%
mutate(country=ShortName) %>%
filter(!is.na(IncomeGroup)) %>%
mutate(indicator_date=as.numeric(Latestpopulationcensus))
#Now calculate our SPI score for this indicator
for (i in 2016:2019) {
temp <- census_df %>%
mutate(date=i) %>%
mutate(last_val=str_extract(Latestpopulationcensus, "\\d{4}") )%>%
mutate(last_val=as.numeric(last_val)) %>%
mutate(indicator=case_when(
(date-last_val<=10) & (date-last_val>0) ~ 1,
(date-last_val<=20) & (date-last_val>0) ~ 0.5,
TRUE ~ 0 )
) %>%
ungroup() %>%
select(c('iso3c', 'country', 'date', 'last_val', 'indicator') ) %>%
arrange(date, country)
assign(paste('temp',i,sep="_"), temp)
}
temp <- temp_2019
for (i in 2016:2018) {
temp <- bind_rows(temp, get(paste('temp',i,sep="_")))
}
temp
## # A tibble: 872 x 5
## iso3c country date last_val indicator
## <chr> <chr> <int> <dbl> <dbl>
## 1 AFG Afghanistan 2019 1979 0
## 2 ALB Albania 2019 2011 1
## 3 DZA Algeria 2019 2008 0.5
## 4 ASM American Samoa 2019 2010 1
## 5 AND Andorra 2019 2011 1
## 6 AGO Angola 2019 2014 1
## 7 ATG Antigua and Barbuda 2019 2011 1
## 8 ARG Argentina 2019 2010 1
## 9 ARM Armenia 2019 2011 1
## 10 ABW Aruba 2019 2010 1
## # ... with 862 more rows
cs_fun <- function(data, input_var) {
data <- data
input_var <- input_var
#read in csv file.
cs_df <- read_csv(file = paste(raw_dir, '/4.1_SOCS/', data, ".csv", sep="" )) %>%
group_by(iso3c, country) %>%
rename(input_var = !! input_var) %>%
nest() %>% # The next chunk of code will split our string with the years of the census (i.e. "2000, 2010") in to separate rows. We will then aggregate up.
mutate(
temp_col = map(
data,
~ str_extract_all(.x$input_var, "\\d{4}") %>%
flatten() %>%
map_chr(~return(.x)) %>%
as_tibble()
)
) %>%
unnest(keep_empty = TRUE) %>% # Now we have a database with the observations equal to Country*Census observations. From here we can calculate latest census, etc.
mutate(indicator_date=as.numeric(value))
#Now calculate our SPI score for this indicator
for (i in 2004:2019) {
temp <- cs_df %>%
mutate(date=i) %>%
mutate(recency_indicator=((date>indicator_date)) ) %>% #restrict to censuses that do not occur after reference date
mutate(indicator_date=if_else(recency_indicator==TRUE, indicator_date, as.numeric(NA))) %>%
group_by(iso3c, country, date, database_last_updated, input_var ) %>%
summarise(last_val=max(indicator_date, na.rm=T)) %>%
mutate(indicator=case_when(
(date-last_val<=10) & (date-last_val>0) ~ 1,
(date-last_val<=20) & (date-last_val>0) ~ 0.5,
TRUE ~ 0 )
) %>%
ungroup() %>%
select(c('iso3c', 'country', 'date', 'input_var', 'indicator') ) %>%
arrange(date, country)
assign(paste('temp',i,sep="_"), temp)
}
temp <- temp_2019
for (i in 2004:2018) {
temp <- bind_rows(temp, get(paste('temp',i,sep="_")))
}
temp
}
#Population Censuses
cs1_df <-cs_fun('D4.1.1.CEN.POPU', 'POPU.CENSUS') %>%
rename(RAW.D4.1.1.POPU.CENSUS=input_var,
SPI.D4.1.1.POPU=indicator)
#Agriculture census
cs2_df <-cs_fun('D4.1.2.CEN.AGRI', 'AGRI.CENSUS') %>%
rename(RAW.D4.1.2.AGRI.CENSUS=input_var,
SPI.D4.1.2.AGRI=indicator)
#Business/establishment census
cs3_df <-cs_fun('D4.1.3.CEN.BIZZ', 'BIZZ.CENSUS') %>%
rename(RAW.D4.1.3.BIZZ.CENSUS=input_var,
SPI.D4.1.3.BIZZ=indicator)
#Household Survey on income/ consumption/ expenditure/ budget/ Integrated Survey
cs4_df <-cs_fun('D4.1.4.SVY.HOUS', 'HOUS.SURVEYS') %>%
rename(RAW.D4.1.4.HOUS.SURVEYS=input_var,
SPI.D4.1.4.HOUS=indicator)
#Agriculture survey
cs5_df <-cs_fun('D4.1.5.SVY.AGRI', 'AGRI.SURVEYS') %>%
rename(RAW.D4.1.5.AGRI.SURVEYS=input_var,
SPI.D4.1.5.AGSVY=indicator)
#Labor Force Survey
cs6_df <-cs_fun('D4.1.6.SVY.LABR', 'LABR.SURVEYS') %>%
rename(RAW.D4.1.6.LABR.SURVEYS=input_var,
SPI.D4.1.6.LABR=indicator)
#Health/Demographic survey
cs7_df <-cs_fun('D4.1.7.SVY.HLTH', 'HLTH.SURVEYS') %>%
rename(RAW.D4.1.7.HLTH.SURVEYS=input_var,
SPI.D4.1.7.HLTH=indicator)
#Business/establishment survey
cs8_df <-cs_fun('D4.1.8.SVY.BIZZ', 'BIZZ.SURVEYS') %>%
rename(RAW.D4.1.8.BIZZ.SURVEYS=input_var,
SPI.D4.1.8.BZSVY=indicator)
#brind all censuses and surveys together
cs_df <- cs1_df %>%
left_join(cs2_df) %>%
left_join(cs3_df) %>%
left_join(cs4_df) %>%
left_join(cs5_df) %>%
left_join(cs6_df) %>%
left_join(cs7_df) %>%
left_join(cs8_df)
#add to spi databases
spi_df <- spi_df %>%
left_join(cs_df)
#now do the figures
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.1.POPU))
spi_mapper('cs_df_map', 'SPI.D4.1.1.POPU', 'Population Census Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.2.AGRI))
spi_mapper('cs_df_map', 'SPI.D4.1.2.AGRI', 'Agriculture Census Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.3.BIZZ))
spi_mapper('cs_df_map', 'SPI.D4.1.3.BIZZ', 'Business/Establishment Census Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.4.HOUS))
spi_mapper('cs_df_map', 'SPI.D4.1.4.HOUS', 'Household Survey Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.5.AGSVY))
spi_mapper('cs_df_map', 'SPI.D4.1.5.AGSVY', 'Agriculture Survey Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.6.LABR))
spi_mapper('cs_df_map', 'SPI.D4.1.6.LABR', 'Labor Force Survey Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.7.HLTH))
spi_mapper('cs_df_map', 'SPI.D4.1.7.HLTH', 'Health Survey Available in Past 20 Years in 2019' )
cs_df_map <- cs_df %>% filter(!is.na(SPI.D4.1.8.BZSVY))
spi_mapper('cs_df_map', 'SPI.D4.1.8.BZSVY', 'Business Survey Available in Past 20 Years in 2019' )
The following indicator checks whether administrative data is available for the following topic areas: Social Protection, Education, Population & Health, and Labor
UNESCO UIS has shared a database containing data we can use on administrative data systems
We score countries based on whether there is:
Data tracking SDG 4.1.2 Administration of a nationally-representative learning assessment (a) in Grade 2 or 3; (b) at the end of primary education; and (c) at the end of lower secondary education
Data tracking whether countries produce the following from administrative data sources:
In order to score this indicator, we calculate the fraction of the three types of indicators that were produced in a country in a year.
#read data from UNESCO API
#get list of indicators to download
indicators <- 'ADMI_GRADE2OR3PRIM_READ+ADMI_ENDOFPRIM_READ+ADMI_ENDOFLOWERSEC_READ+ADMI_GRADE2OR3PRIM_MAT+ADMI_ENDOFPRIM_MAT+ADMI_ENDOFLOWERSEC_MAT+ROFST_1_CP+ROFST_1_F_CP+ROFST_1_M_CP+ROFST_2_CP+ROFST_2_F_CP+ROFST_2_M_CP+ROFST_3_CP+ROFST_3_F_CP+ROFST_3_M_CP+ROFST_1T2_CP+ROFST_1T2_F_CP+ROFST_1T2_M_CP+ROFST_2T3_CP+ROFST_2T3_F_CP+ROFST_2T3_M_CP+ROFST_1T3_CP+ROFST_1T3_F_CP+ROFST_1T3_M_CP+NERA_AGM1_CP+NERA_AGM1_F_CP+NERA_AGM1_M_CP.'
#get list of countries
countries <- 'AFG+ALB+DZA+ASM+AND+AGO+AIA+ATG+ARG+ARM+ABW+AUS+AUT+AZE+BHS+BHR+BGD+BRB+BLR+BEL+BLZ+BEN+BMU+BTN+BOL+BIH+BWA+BRA+VGB+BRN+BGR+BFA+BDI+KHM+CMR+CAN+CPV+CYM+CAF+TCD+CHL+CHN+HKG+MAC+COL+COM+COG+COK+CRI+CIV+HRV+CUB+CUW+CYP+CZE+PRK+COD+DNK+DJI+DMA+DOM+ECU+EGY+SLV+GNQ+ERI+EST+SWZ+ETH+FJI+FIN+FRA+GUF+PYF+GAB+GMB+GEO+DEU+GHA+GIB+GRC+GRL+GRD+GLP+GUM+GTM+GIN+GNB+GUY+HTI+VAT+HND+HUN+ISL+IND+IDN+IRN+IRQ+IRL+ISR+ITA+JAM+JPN+JOR+KAZ+KEN+KIR+KWT+KGZ+LAO+LVA+LBN+LSO+LBR+LBY+LIE+LTU+LUX+MDG+MWI+MYS+MDV+MLI+MLT+MHL+MTQ+MRT+MUS+MEX+FSM+MCO+MNG+MNE+MSR+MAR+MOZ+MMR+NAM+NRU+NPL+NLD+ANT+NCL+NZL+NIC+NER+NGA+NIU+MKD+NOR+OMN+PAK+PLW+PSE+PAN+PNG+PRY+PER+PHL+POL+PRT+PRI+QAT+KOR+MDA+REU+ROU+RUS+RWA+SHN+KNA+LCA+SPM+VCT+WSM+SMR+STP+SAU+SEN+SRB+SYC+SLE+SGP+SXM+SVK+SVN+SLB+SOM+ZAF+SSD+ESP+LKA+SDN+XDN+SUR+SWE+CHE+SYR+TJK+THA+TLS+TGO+TKL+TON+TTO+TUN+TUR+TKM+TCA+TUV+UGA+UKR+ARE+GBR+TZA+USA+URY+UZB+VUT+VEN+VNM+WLF+YEM+ZMB+ZWE+SDG+40675+40330+40334+40344+40606+40617+40603+40614+40618+40612+40616+40619+40611+40613+40630+40650+40651+40656+40642+40620+40640+UIS+40510+40525+40530+40505+40515+40520+40500+40535+40540+40550+WB+40044+40042+40030+40043+40041/all'
start <- '2004'
end <- '2019'
#build the API query
url <- paste('http://data.uis.unesco.org/RestSDMX/sdmx.ashx/GetData/SDG4_DS/',indicators, countries,'?startTime=',start,'&endTime=',end,sep="")
#read the data
unesco_df_raw <- readsdmx::read_sdmx(url)
#cleaning of unesco data
unesco_df <- unesco_df_raw %>%
mutate(date=as.numeric(Time)) %>%
mutate(type = case_when( #categorize based on whether they are assessment, out of school, or participation data
grepl('ADMI', SDG_IND) ~ "NAS", #Administration of a nationally-representative learning assessment (a) in Grade 2 or 3; (b) at the end of primary education; and (c) at the end of lower secondary education
grepl('ROFST', SDG_IND) ~ "OOS", # Out-of-school rate by school age and sex
grepl('NERA', SDG_IND) ~ "ORGL" # Participation rate in organized learning (one year before the official primary entry age), by sex
)) %>%
mutate(edu_admin_available=(ObsValue!=NaN & !is.na(ObsValue))) %>% #get a variable for wehther admin based value exists for indicator
group_by( LOCATION2017,type, date) %>%
summarise(edu_admin_available=mean(as.numeric(edu_admin_available))) %>% #c
ungroup() %>%
pivot_wider(
names_from='type',
names_glue="RAW.D4.2.2.EDU.{type}",
values_from='edu_admin_available',
values_fill=0
) %>%
rename(iso3c=LOCATION2017) %>%
right_join(spi_df_empty) %>%
mutate(across(starts_with('RAW.D4.2.2.EDU'),~if_else(is.na(.),0,.))) %>%
mutate(SPI.D4.2.2.EDU=(RAW.D4.2.2.EDU.NAS+RAW.D4.2.2.EDU.OOS+RAW.D4.2.2.EDU.ORGL)/3) %>%
select(c('iso3c', 'country', 'date', 'RAW.D4.2.2.EDU.NAS','RAW.D4.2.2.EDU.OOS','RAW.D4.2.2.EDU.ORGL', 'SPI.D4.2.2.EDU') )
#add to spi databases
spi_df <- spi_df %>%
left_join(unesco_df)
unesco_map <- unesco_df
spi_mapper('unesco_map', 'SPI.D4.2.2.EDU', 'Education Administrative Data Available According to UIS' )
Civil registration is the act of recording and documenting of vital events in a person’s life (including birth, marriage, divorce, adoption, and death and cause of death) and is a fundamental function of national governments. Birth registration establishes an individual’s legal identity at birth. A legal identity, name, nationality, and proof of age, are important human rights. They enable individuals to be included in various government, social and private services, and include the right to vote, etc. Vital statistics are compiled using civil registration information on these vital events. The availability of reliable and up-to-date vital statistics depends on the level of development of civil registration programs. An effective civil registration and vital statistics (CRVS) system is critical for planning and monitoring programs across several sectors.
By complete, that is representing 90 per cent or more of the events occurring in the specified year.
Source: World Bank WDI Metadata.
#get data directly from WDI metadata
Request_metadata <- GET(url = "https://api.worldbank.org/v2/sources/2/country/all/metadata?per_page=30000&format=json")
Response_metadata <- content(Request_metadata, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
JSON_metadata <- jsonlite::fromJSON(Response_metadata, flatten = TRUE) %>%
data.frame()
#do some conversion to produce a dataframe
df <- JSON_metadata$source.concept %>%
data.frame()
#do some more cleaning and conversion
df <- df$variable %>%
data.frame() %>%
rename(iso3c=id) %>%
unnest(metatype) %>%
pivot_wider(
id_cols=iso3c,
names_from=id,
values_from=value
)
# Manipulate and clean final data
df <- df %>%
filter(!is.na(IncomeGroup)) #keep just countries (drop aggregations)
df$SPI.D4.2.3.CRVS<-as.numeric(grepl("Yes", df$Vitalregistrationcomplete)) #create 0,1 variable for vital registration
crvs_df <- df %>%
mutate(date=2019,
country=TableName,
RAW.D4.2.3.CRVS=Vitalregistrationcomplete) %>%
select(c('iso3c', 'country', 'date', 'RAW.D4.2.3.CRVS', 'SPI.D4.2.3.CRVS') )
#pull old data from SCI
Request_metadata <- GET(url = "http://api.worldbank.org/v2/country/all/indicator/3.11.01.03.popreg?format=json&date=2004:2018&per_page=5000")
Response_metadata <- content(Request_metadata, as = "text", encoding = "UTF-8")
# Parse the JSON content and convert it to a data frame.
crvs_df2 <- jsonlite::fromJSON(Response_metadata, flatten = TRUE) %>%
data.frame() %>%
transmute(
iso3c=countryiso3code,
country=country.value,
date=as.numeric(date),
crvs=value
) %>%
mutate(SPI.D4.2.3.CRVS=as.numeric(crvs),
RAW.D4.2.3.CRVS=as.character(crvs)) %>%
left_join(spi_df_empty) #add on country metadata
# Manipulate and clean final data
crvs_df2 <- crvs_df2 %>%
filter(!is.na(income)) %>% #keep just countries (drop aggregations)
select(c('iso3c', 'country', 'date', 'RAW.D4.2.3.CRVS', 'SPI.D4.2.3.CRVS') )
#join updated data with SCI data
crvs_df <- crvs_df %>%
bind_rows(crvs_df2)
#add to spi databases
spi_df <- spi_df %>%
left_join(crvs_df)
crvs_map <- crvs_df
spi_mapper('crvs_map', 'SPI.D4.2.3.CRVS', 'Complete Civil Registration & Vital Statistics Data Available According to WDI' )
ILO shared database containing all administrative data sources used to produce ILO statistics (ILOSTAT)
Database contains the following admin sources:
Database extends from 2010 to 2019 and covers 177 countries.
To produce a score, we calculate the percentage of these six admin sources above that are available for a country in a given year.
#read data
ilo_df_raw <- read_csv(paste(raw_dir, '4.2_SOAD','ILOSTAT_sources.csv', sep="/"))
#cleaning of aspire data
ilo_df <- ilo_df_raw %>%
filter(!grepl('legislation',Source)) %>% # filter out labor legislation, which is included but not reflective of needs.
group_by(iso3c, date) %>%
summarise(ILO.Source.type=last(Source.type)) %>%
left_join(country_metadata)
#apply carryforward to fill in missing values
ilo_df <- ilo_df %>%
right_join(spi_df_empty) %>%
arrange(iso3c, date) %>%
group_by(iso3c) %>%
mutate(ILO.Source.type.filled=if_else(!is.na(ILO.Source.type),1, as.numeric(NA))) %>%
fill(ILO.Source.type.filled, .direction='down') %>%
mutate(SPI.D4.2.4.LBR=if_else(is.na(ILO.Source.type.filled),0, ILO.Source.type.filled),
RAW.D4.2.4.LBR=ILO.Source.type) %>%
select(c('iso3c', 'country', 'date', 'RAW.D4.2.4.LBR', 'SPI.D4.2.4.LBR') )
#add to spi databases
spi_df <- spi_df %>%
left_join(ilo_df)
ilo_map <- ilo_df
spi_mapper('ilo_map', 'SPI.D4.2.4.LBR', 'Labor Administrative Data Available According to ILO' )
New indicator based on references to geospatial data in metadata relating to content on NSO website
Our source for this indicator is Open Data Watch. From Open Data watch:
The Open Data Inventory (ODIN) assesses the coverage and openness of official statistics to help identify gaps, promote open data policies, improve access, and encourage dialogue between national statistical offices (NSOs) and data users. ODIN 2018/19 includes 178 countries, including most all OECD countries. Two-year comparisons are for all countries with two years of data between 2015-2017. Scores can be compared across topics and countries.
We use their indicator on whether indicators are available at the first or second administrative level. To identify the first administrative levels, ODIN largely draws on the ISO 3166-2 standard. In many countries, first administrative levels refer to governorates, regions, or province. No official list exists for the second administrative level classifications. If geographical disaggregation exists that does not qualify as first administrative level, assume that the data are disaggregated to the second administrative level as long as the classification appears to be a further divisions of the first administrative level.
Scoring for the ODIN indicators for geospatial information is below:
There are 21 data categories.
Social Statistics
1. Population and Vital Statistics
2. Education Facilities
3. Education Outcomes
4. Health Facilities
5. Health Outcomes
6. Reproductive Health
7. Gender Statistics
8. Crime and Justice Statistics
9. Poverty Statistics
Economic Statistics
10. National Accounts
11. Labor Statistics
12. Price Indexes
13. Government Finance
14. Money and Banking
15. International Trade
16. Balance of Payments
Environmental Statistics
17. Land Use
18. Resource Use
19. Energy Use
20. Pollution
21. Built Environment
For the first administrative unit: Money & Banking, International Trade, and Balance of Payments are not scored for this element. For various indicators, lenient interpretations are used for first administrative divisions.
For the second administrative unit: Money & Banking, International Trade, Balance of Payments, National Accounts, Government Finance ,Pollution, Energy Use, Price Indexes, and Resource Use are not scored for this element. For various indicators within categories, second administrative level data is not required as well.
In the scores we present below, we show a score between 0 and 1 with a maximum score of 1, which would mean the country has geo data in full for 100% of elements. A score of 0 indicates no data at all for any elements.
More details on the geographic disaggregation considerations can be found in their technical manual:
https://docs.google.com/document/d/1ubPL1l_3im9bjlCVZ6W2ICAy6UAiXl1hGeA1aXImkxI/edit
#read in ODIN data
for (i in 2015:2018) {
temp <- read_csv(paste(raw_dir, '/2.2_DSOA/','ODIN_',i,'.csv', sep="")) %>%
as_tibble(.name_repair = 'universal') %>%
mutate(date=i) %>%
filter(Data.categories=='All Categories')
assign(paste('geo_df',i,sep="_"), temp)
}
#bind different years together
geo_df <- bind_rows(geo_df_2015, geo_df_2016, geo_df_2017, geo_df_2018)
#create geo scores
geo_df <- geo_df %>%
select(Country.Code, date, First.administrative.level, Second.administrative.level) %>%
rename(iso3c=Country.Code) %>%
group_by( iso3c) %>%
right_join(country_metadata) %>%
mutate(D4.3.GEO.first.admin.level=if_else(is.na(First.administrative.level), 0, First.administrative.level),
D4.3.GEO.second.admin.level=if_else(is.na(Second.administrative.level), 0, Second.administrative.level))
#do some quick renaming and formatting
geo_df_temp1 <- geo_df %>%
rename_with(~paste('RAW',., sep="."), .cols=c('D4.3.GEO.first.admin.level', 'D4.3.GEO.second.admin.level') )
geo_df_temp2 <- geo_df %>%
mutate(across(c('D4.3.GEO.first.admin.level', 'D4.3.GEO.second.admin.level'), ~./100)) %>%
rename_with(~paste('SPI',., sep="."), .cols=c('D4.3.GEO.first.admin.level', 'D4.3.GEO.second.admin.level') )
geo_df <- geo_df_temp1 %>%
left_join(geo_df_temp2)
#add to spi df
spi_df <- spi_df %>%
left_join(geo_df)
New indicator based on references to private/citizen generated data in metadata relating to content on NSO website.
We have seen an increased use of private/citizen generated data during the COVID-19 pandemic. One way in which it has been used is for tracking mobility of citizens to understand the social distancing measures citizens are taking. As a measure of private/citizen generated data we make use of Apple and Google Mobility tracking data made publicly available during the pandemic.
Source:
Google LLC “Google COVID-19 Community Mobility Reports”. https://www.google.com/covid19/mobility/ Accessed: 2020-07-23.
Apple “Mobility Trends Reports”. https://www.apple.com/covid19/mobility/ Accessed: 2020-07-23.
#####
# Google
#####
#google mobility
google_df_raw <- read_csv(paste(raw_dir, '4.4_SOPC','Global_Mobility_Report.csv', sep="/"))
#match to world bank country database
google_df <- google_df_raw %>%
group_by(country_region_code) %>%
summarise(freq=n()) %>%
left_join(country_metadata, by=c('country_region_code'='iso2c')) %>%
mutate(date=2020,
value=1)
spi_mapper('google_df', 'value', 'Google Global Mobility Report Data' )
#####
# Apple
#####
apple_df_raw <- read_csv(paste(raw_dir, '4.4_SOPC','applemobilitytrends-2020-07-26.csv', sep="/"))
#match to world bank country database
apple_df <- apple_df_raw %>%
select(-c('country', 'alternative_name', 'sub-region')) %>%
filter(geo_type=="country/region") %>%
rename(country=region) %>%
pivot_longer(cols=starts_with("2020"),
names_to='date',
values_to='value') %>%
group_by(country) %>%
summarise(across(everything(), last)) %>%
mutate(country=case_when(
country=="Egypt" ~ "Egypt, Arab Rep.",
country=="Republic of Korea" ~ "Korea, Rep.",
country=="Macao" ~ "Macao SAR, China",
country=="Slovakia" ~ "Slovak Republic",
country=="Taiwan" ~ "Taiwan, China",
TRUE ~ country
)) %>%
group_by(country) %>%
summarise(freq=n()) %>%
left_join(country_metadata) %>%
mutate(date=2020,
value=1)
spi_mapper('apple_df', 'value', 'Apple Mobility Report Data' )
Data Infrastructure (5 Indicators):
- 5.1_DILG - Indicator 5.1: legislation and governance
- 5.2_DISM - Indicator 5.2: standards
- 5.3_DISK - Indicator 5.3: skills
- 5.4_DIPN - Indicator 5.4: partnerships
- 5.5_DIFI - Indicator 5.5: finance
The legislation and governance indicator will be drawn from SDG indicator 17.18.2 (national statistical legislation compliance with UN Fundamental Principles of Official Statistics), existence of National Statistical Council, national statistical strategy generation, national statistical plan. Also include some other legislative aspects that foster good use of statistics eg freedom of information, privacy/transparency, good governance (eg free and fair elections).
This indicator measures whether the national statistical legislation complies with United Nations Fundamental Principles of Statistics (SDG 17.18.2)
Scores is 1 if the country has a national statistical legislation compliant with United Nations Fundamental Principles of Statistics. Scores of 0 or scores with missing values are treated the same (both given a score of zero).
The source is Paris 21 and UNSD. Data accessed using UNSD SDG API on 2020-09-02
D5.1_DILG <- un_pull('SG_STT_FPOS', '2004', '2019') %>%
select(iso3c, date,value) %>%
right_join(spi_df_empty) %>%
mutate(RAW.5.1.DILG=if_else((is.na(value) | value=="NaN"),as.numeric(NA), as.numeric(value)),
SPI.5.1.DILG=if_else((is.na(value) | value=="NaN"),0, as.numeric(value))) %>%
select(-value)
#
# #read in csv file.
# D5.1_DILG2 <- read_csv(file = paste(raw_dir, "5.1_DILG","5.1_DILG.csv", sep="/" )) %>%
# as_tibble(.name_repair = 'universal') %>%
# transmute(
# country=Country,
# date=Year,
# SPI.5.1.DILG=Data.Value
# ) %>%
# mutate(country=case_when(
# country=="Bahamas" ~ "Bahamas, The",
# country=="Bolivia (Plurinational State of)" ~ "Bolivia" ,
# country=="Côte d'Ivoire" ~ "Cote d'Ivoire" ,
# country=="Democratic Republic of the Congo" ~ "Congo, Dem. Rep." ,
# country=="Congo" ~ "Congo, Rep." ,
# country=="Curacao" ~ "Curacao" ,
# country=="Czechia" ~ "Czech Republic" ,
# country=="Egypt" ~ "Egypt, Arab Rep." ,
# country=="Micronesia (Federated States of)" ~ "Micronesia, Fed. Sts." ,
# country=="United Kingdom" ~ "United Kingdom" ,
# country=="Gambia" ~ "Gambia, The" ,
# country=="Iran (Islamic Republic of)" ~ "Iran, Islamic Rep." ,
# country=="Kyrgyzstan" ~ "Kyrgyz Republic" ,
# country=="Republic of Korea" ~ "Korea, Rep." ,
# country=="Lao People's Democratic Republic" ~ "Lao PDR" ,
# country=="Saint Kitts and Nevis " ~ "St. Kitts and Nevis",
# country=="Saint Lucia" ~ "St. Lucia",
# country=="Republic of Moldova" ~ "Moldova" ,
# country=="Democratic People's Republic of Korea" ~ "Korea, Dem. People’s Rep." ,
# country=="Slovakia" ~ "Slovak Republic" ,
# country=="United Republic of Tanzania" ~ "Tanzania" ,
# country=="United States of America" ~ "United States" ,
# country=="Saint Vincent and the Grenadines" ~ "St. Vincent and the Grenadines" ,
# country=="Venezuela (Bolivarian Republic of)" ~ "Venezuela, RB" ,
# country=="British Virgin Islands" ~ "British Virgin Islands" ,
# country=="Viet Nam" ~ "Vietnam" ,
# country=="Yemen" ~ "Yemen, Rep.",
# country=="United Kingdom of Great Britain and Northern Ireland" ~ "United Kingdom",
# TRUE ~ country
# )) %>%
# mutate(RAW.5.1.DILG=if_else(SPI.5.1.DILG=='no data',as.character(NA), SPI.5.1.DILG),
# SPI.5.1.DILG=if_else(SPI.5.1.DILG=='no data',as.numeric(NA), as.numeric(SPI.5.1.DILG))) %>%
# right_join(spi_df_empty)
#add to spi databases
spi_df <- spi_df %>%
left_join(D5.1_DILG)
D5.1_DILG_map <- D5.1_DILG %>%
filter(!is.na(RAW.5.1.DILG))
spi_mapper('D5.1_DILG_map', 'SPI.5.1.DILG', 'National statistical legislation complies with United Nations Fundamental Principles of Statistics. Source: Paris21')
## Indicator 5.2: standards
The national accounts data are compiled using the concepts, definitions, framework, and methodology of the System of National Account 2008 (SNA2008) or European System of National and Regional Accounts (ESA 2010). The manual has evolved to meet the changing economic structure, to follow systematic accounting and ensure international compatibility.
Scoring: 1 point for using SNA2008 or ESA 2010, 0.5 points for using SNA 1993 or ESA 1995, 0 points otherwise
#read in csv file.
D5.2.1.SNAU <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.1.SNAU.csv", sep="/" )) %>%
mutate(SPI.D5.2.1.SNAU=case_when(
SNAU=="SNA 2008" ~ 1,
SNAU=="SNA 1993" ~ 0.5,
TRUE ~ 0
),
RAW.D5.2.1.SNAU=SNAU) %>%
arrange(date, country) %>%
select(iso3c, country, date,RAW.D5.2.1.SNAU, SPI.D5.2.1.SNAU )
National accounts base year is the year used as the base period for constant price calculations in the country’s national accounts. It is recommended that the base year of constant price estimates be changed periodically to reflect changes in economic structure and relative prices.
1 point for chained price, 0.5 for reference period within past 10 years, 0 points otherwise.
#read in csv file.
D5.2.2.NABY <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.2.NABY.csv", sep="/" )) %>%
mutate(SPI.D5.2.2.NABY=case_when(
NABY=="Original chained constant price data are rescaled." ~ 1,
(date-as.numeric(NABY))<=10 ~ 0.5, #within 10 years of reference period
TRUE ~ 0
),
RAW.D5.2.2.NABY=NABY) %>%
select(iso3c, country, date, RAW.D5.2.2.NABY, SPI.D5.2.2.NABY ) %>%
arrange(date, country)
The industrial production data are compiled using the International Standard Industrial Classification of All Economic Activities (ISIC) Rev.4 and Statistical Classification of Economic Activities in the European Community (NACE) Rev.2. ISIC Rev.4 is a standard classification of economic activities arranged so that entities can be classified per the activity they carry out using criteria such as input, output and use of the products produced, more emphasis has been given to the character of the production process in defining and delineating ISIC classes for international comparability. The manual and classification have changed to cover the complete scope of industrial production, employment, and GDP and other statistical areas.
1 Point. Latest version is adopted (ISIC Rev 4, NACE Rev 2 or a compatible classification)
0.5 Points. Previous version is used (ISIC Rev 3, NACE Rev 1 or a compatible classification)
0 Points. Otherwise
#read in csv file.
D5.2.3.CNIN <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.3.CNIN.csv", sep="/" )) %>%
mutate(SPI.D5.2.3.CNIN=case_when(
str_to_lower(CNIN)=="nace rev2" | str_to_lower(CNIN)=="rev4" ~ 1,
str_to_lower(CNIN)=="nace rev1" | str_to_lower(CNIN)=="rev3" ~ 0.5,
TRUE ~ 0
),
RAW.D5.2.3.CNIN=CNIN) %>%
select(iso3c, country, date, RAW.D5.2.3.CNIN, SPI.D5.2.3.CNIN ) %>%
arrange(date, country)
Consumer Price Index serves as indicators of inflation and reflects changes in the cost of acquiring a fixed basket of goods and services by the average consumer.
Weights are usually derived from consumer expenditure surveys and the CPI base year refers to the year the weights were derived. It is recommended that the base year be changed periodically to reflect changes in expenditure structure.
1 Point. Annual chain linking. 0.5 Points. Base year in last 10 years. 0 points. Otherwise
#read in csv file.
D5.2.4.CPIBY <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.4.CPIBY.csv", sep="/" )) %>%
mutate(SPI.D5.2.4.CPIBY=case_when(
CPIBY=="annual chained" ~ 1,
(date-as.numeric(CPIBY))<=10 ~ 0.5, #within 10 years of reference period
TRUE ~ 0
),
RAW.D5.2.4.CPIBY=CPIBY) %>%
select(iso3c, country, date, RAW.D5.2.4.CPIBY, SPI.D5.2.4.CPIBY ) %>%
arrange(date, country)
Classification of Individual Consumption According to Purpose (COICOP) is used in household budget surveys, consumer price indices and international comparisons of gross domestic product (GDP) and its component expenditures.
Although COICOP is not strictly linked to any particular model of consumer behavior, the classification is designed to broadly reflect differences in income elasticities. It is an integral part of the SNA1993 and more detailed subdivision of the classes provide comparability between countries and between statistics in these different areas.
1 Point. Follow Classification of Individual Consumption by Purpose (COICOP) 0 Points. Otherwise
#read in csv file.
D5.2.5.HOUS <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.5.HOUS.csv", sep="/" )) %>%
mutate(SPI.D5.2.5.HOUS=case_when(
HOUS=="COICOP" ~ 1,
TRUE ~ 0
),
RAW.D5.2.5.HOUS=HOUS) %>%
select(iso3c, country, date, RAW.D5.2.5.HOUS, SPI.D5.2.5.HOUS ) %>%
arrange(date, country)
Classification of status of employment refers to employment data that are compiled using the current international standard International Classification of Status in Employment (ISCE-93). It classifies jobs with respect to the type of explicit or implicit contract of employment between the job holder and the economic unit in which he or she is employed. Therefore, it aims to provide the basis for production of internationally comparable statistics on the employment relationship, including the distinction between salaried employment and self-employment.
1 Point. Follow International Labour Organization, International Classification of Status in Employment (ICSE-93) or 2012 North American Industry Classification System (NAICS). 0 Points Otherwise.
#read in csv file.
D5.2.6.EMPL <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.6.EMPL.csv", sep="/" )) %>%
mutate(SPI.D5.2.6.EMPL=case_when(
EMPL=="ICSE-93" | EMPL=="NAICS" ~ 1,
TRUE ~ 0
),
RAW.D5.2.6.EMPL=EMPL) %>%
select(iso3c, country, date, RAW.D5.2.6.EMPL, SPI.D5.2.6.EMPL ) %>%
arrange(date, country)
Government finance accounting status refers to the accounting basis for reporting central government financial data. For many countries’ government finance data, have been consolidated into one set of accounts capturing all the central government’s fiscal activities and following noncash recording basis.
Budgetary central government accounts do not necessarily include all central government units, the picture they provide of central government activities is usually incomplete.
1 Point. Consolidated central government accounting follows noncash recording basis
0.5 Points. Consolidated central government accounting follows cash recording basis
0 Points. Otherwise
#read in csv file.
D5.2.7.CGOV <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.7.CGOV.csv", sep="/" )) %>%
mutate(SPI.D5.2.7.CGOV=case_when(
CGOV=="AC" ~ 1,
CGOV=="CA" ~ 0.5,
TRUE ~ 0
),
RAW.D5.2.7.CGOV=CGOV) %>%
select(iso3c, country, date, RAW.D5.2.7.CGOV, SPI.D5.2.7.CGOV ) %>%
arrange(date, country)
(GFSM) in use for compiling the data. It provides guidelines on the institutional structure of governments and the presentation of fiscal data in a format similar to business accounting with a balance sheet and income statement plus guidelines on the treatment of exchange rate and other valuation adjustments. The latest manual GFSM2014 is harmonized with the SNA2008.
1 Point. Follow the latest Government Finance Statistical Manual (2014)/ ESA2010
0.5 Points. Previous version is used (GFSM 2001)
0 Points. Otherwise
#read in csv file.
D5.2.8.FINA <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.8.FINA.csv", sep="/" )) %>%
mutate(SPI.D5.2.8.FINA=case_when(
FINA=="2014" | FINA=="ESA 2010"~ 1,
FINA=="2001" ~ 0.5,
TRUE ~ 0
),
RAW.D5.2.8.FINA=FINA) %>%
select(iso3c, country, date, RAW.D5.2.8.FINA, SPI.D5.2.8.FINA ) %>%
arrange(date, country)
Compilation of monetary and financial statistics refers to the Monetary and Financial Statistics Manual (MFSM) in use. It covers concepts, definitions, classifications of financial instruments and sectors, and accounting rules, and provides a comprehensive analytic framework for monetary and financial planning and policy determination. The Monetary and Finance Statistics: Compilation Guide (2008) provides detailed guidelines for the compilation of monetary and financial statistics in addition to MFSM.
1 Point. Follow the latest Monetary and Finance Statistics Manual (2000) or Monetary and Finance Statistics: Compilation Guide (2008/2016) 0 Points. Otherwise
#read in csv file.
D5.2.9.MONY <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.9.MONY.csv", sep="/" )) %>%
mutate(SPI.D5.2.9.MONY=case_when(
MONY=="MFSM 2000" | MONY=="MFSMCG 2016"~ 1,
TRUE ~ 0
),
RAW.D5.2.9.MONY=MONY) %>%
select(iso3c, country, date, RAW.D5.2.9.MONY, SPI.D5.2.9.MONY ) %>%
arrange(date, country)
The Generic Statistical Business Process Model (GSBPM) aims to describe statistics production in a general and process-oriented way. It is used both within and between statistical offices as a common basis for work with statistics production in different ways, such as quality, efficiency, standardization, and process-orientation. It is used for all types of surveys, and “business” is not related to “business statistics” but refers to the statistical office, simply expressed.
1 Point. GSBPM is in use 0 Points. Otherwise
#read in csv file.
D5.2.10.GSBP <- read_csv(file = paste(raw_dir, "5.2_DISM","D5.2.10.GSBP.csv", sep="/" )) %>%
mutate(SPI.D5.2.10.GSBP=case_when(
str_to_lower(GSBP)=="yes" ~ 1,
TRUE ~ 0
),
RAW.D5.2.10.GSBP=GSBP) %>%
select(iso3c, country, date, RAW.D5.2.10.GSBP, SPI.D5.2.10.GSBP ) %>%
arrange(date, country)
# join datasets together based on country and date
D5.2.MSC <- D5.2.1.SNAU %>%
left_join(D5.2.2.NABY) %>%
left_join(D5.2.3.CNIN) %>%
left_join(D5.2.4.CPIBY) %>%
left_join(D5.2.5.HOUS) %>%
left_join(D5.2.6.EMPL) %>%
left_join(D5.2.7.CGOV) %>%
left_join(D5.2.8.FINA) %>%
left_join(D5.2.9.MONY) %>%
left_join(D5.2.10.GSBP) %>%
select(iso3c, country, date, everything())
#Now calculate MSC score which is the average across the 12 indicators
D5.2.MSC <- D5.2.MSC %>%
mutate(SPI.D5.2.MSC=100*rowMeans(.[grep(x=colnames(D5.2.MSC),
pattern="SPI.D5.2")], na.rm=TRUE)) %>%
arrange(date, iso3c)
#add to spi databases
spi_df <- spi_df %>%
left_join(D5.2.MSC)
#now do the figures
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.1.SNAU))
spi_mapper('msc_df_map', 'SPI.D5.2.1.SNAU', 'System of National Accounts in use in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.2.NABY))
spi_mapper('msc_df_map', 'SPI.D5.2.2.NABY', 'National Accounts base year in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.3.CNIN))
spi_mapper('msc_df_map', 'SPI.D5.2.3.CNIN', 'Classification of national industry in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.4.CPIBY))
spi_mapper('msc_df_map', 'SPI.D5.2.4.CPIBY', 'CPI base year in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.5.HOUS))
spi_mapper('msc_df_map', 'SPI.D5.2.5.HOUS', 'Classification of household consumption in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.6.EMPL))
spi_mapper('msc_df_map', 'SPI.D5.2.6.EMPL', 'Classification of status of employment in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.7.CGOV))
spi_mapper('msc_df_map', 'SPI.D5.2.7.CGOV', 'Central government accounting status in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.8.FINA))
spi_mapper('msc_df_map', 'SPI.D5.2.8.FINA', 'Compilation of government finance statistics in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.9.MONY))
spi_mapper('msc_df_map', 'SPI.D5.2.9.MONY', 'Compilation of monetary and financial statistics in 2019' )
msc_df_map <- D5.2.MSC %>% filter(!is.na(RAW.D5.2.10.GSBP))
spi_mapper('msc_df_map', 'SPI.D5.2.10.GSBP', 'Business process in 2019' )
This indicator assesses the systematic use of statistical knowledge with statistical terms and indicators in national policy documents. It is a composite indicator consisting of 4 sub-indices that aim to reflect the relevance of statistical evidence. It comprises four main dimensions: i) Basic consideration; ii) Diagnosis and quantification; iii) Statistical Analysis; iv) Disaggregation.
Score ranges from 0 to 100. The score is the weighted sum of each of the four components’ scores, whose relative weight is reported in brackets: i) Basic Consideration (25%); ii) Diagnosis and quantification (24%); iii) Statistical Analysis (22%); iv) Disaggregation (29%).
The source is Paris 21. Data accessed on August 6, 2020 from https://statisticalcapacitymonitor.org/indicator/127
#read in csv file.
D5.3_DISK <- read_csv(file = paste(raw_dir, "5.3_DISK","5.3_DISK.csv", sep="/" )) %>%
as_tibble(.name_repair = 'universal') %>%
transmute(
country=Country,
date=Year,
SPI.5.3.DISK=Data.Value
) %>%
mutate(country=case_when(
country=="Bahamas" ~ "Bahamas, The",
country=="Bolivia (Plurinational State of)" ~ "Bolivia" ,
country=="Côte d'Ivoire" ~ "Cote d'Ivoire" ,
country=="Democratic Republic of the Congo" ~ "Congo, Dem. Rep." ,
country=="Congo" ~ "Congo, Rep." ,
country=="Curacao" ~ "Curacao" ,
country=="Czechia" ~ "Czech Republic" ,
country=="Egypt" ~ "Egypt, Arab Rep." ,
country=="Micronesia (Federated States of)" ~ "Micronesia, Fed. Sts." ,
country=="United Kingdom" ~ "United Kingdom" ,
country=="Gambia" ~ "Gambia, The" ,
country=="Iran (Islamic Republic of)" ~ "Iran, Islamic Rep." ,
country=="Kyrgyzstan" ~ "Kyrgyz Republic" ,
country=="Republic of Korea" ~ "Korea, Rep." ,
country=="Lao People's Democratic Republic" ~ "Lao PDR" ,
country=="Saint Kitts and Nevis " ~ "St. Kitts and Nevis",
country=="Saint Lucia" ~ "St. Lucia",
country=="Republic of Moldova" ~ "Moldova" ,
country=="Democratic People's Republic of Korea" ~ "Korea, Dem. People’s Rep." ,
country=="Slovakia" ~ "Slovak Republic" ,
country=="United Republic of Tanzania" ~ "Tanzania" ,
country=="United States of America" ~ "United States" ,
country=="Saint Vincent and the Grenadines" ~ "St. Vincent and the Grenadines" ,
country=="Venezuela (Bolivarian Republic of)" ~ "Venezuela, RB" ,
country=="British Virgin Islands" ~ "British Virgin Islands" ,
country=="Viet Nam" ~ "Vietnam" ,
country=="Yemen" ~ "Yemen, Rep.",
country=="United Kingdom of Great Britain and Northern Ireland" ~ "United Kingdom",
TRUE ~ country
)) %>%
mutate(RAW.5.3.DISK=if_else(SPI.5.3.DISK=='no data',as.character(NA), SPI.5.3.DISK),
SPI.5.3.DISK=if_else(SPI.5.3.DISK=='no data',as.numeric(NA), as.numeric(SPI.5.3.DISK))/100) %>%
right_join(spi_df_empty)
#add to spi databases
spi_df <- spi_df %>%
left_join(D5.3_DISK)
D5.3_DISK_map <- D5.3_DISK %>%
filter(!is.na(SPI.5.3.DISK))
spi_mapper('D5.3_DISK_map', 'SPI.5.3.DISK', 'Systematic use of statistical knowledge with statistical terms and indicators in national policy documents. Source: Paris21')
No data is available for this indicator at this point.
Indicator based on SDG indicators (SDG 17.18.3 (national statistical plan that is fully funded and under implementation). This indicator measures whether the national statistical plan under implementation is fully funded. It relates to SDG 17.18.3 and is based on the annual Status Report on National Strategies for the Development of Statistics (NSDS).
Scores is 1 if the country has a national statistical plan that is fully funded and under implementation. Scores of 0 or scores with missing values are treated the same (both given a score of zero).
The source is Paris 21 and UNSD. Data accessed using UNSD SDG API on 2020-09-02
Source of this data is UNSD and Paris21. Downloaded on August 6,2020 from https://statisticalcapacitymonitor.org/indicator/138/
#pull from unsd api
D5.5_DIFI <- un_pull('SG_STT_NSDSFND', '2004', '2019') %>%
select(iso3c, date,value) %>%
right_join(spi_df_empty) %>%
mutate(RAW.5.5.DIFI=if_else((is.na(value) | value=="NaN"),as.numeric(NA), as.numeric(value)),
SPI.5.5.DIFI=if_else((is.na(value) | value=="NaN"),0, as.numeric(value))) %>%
select(-value)
# #read in csv file.
# D5.5_DIFI <- read_csv(file = paste(raw_dir, "5.5_DIFI","D5.5_DIFI.csv", sep="/" )) %>%
# as_tibble(.name_repair = 'universal') %>%
# transmute(
# country=Country,
# date=Year,
# SPI.5.5.DIFI=Data.Value
# ) %>%
# mutate(country=case_when(
# country=="Bahamas" ~ "Bahamas, The",
# country=="Bolivia (Plurinational State of)" ~ "Bolivia" ,
# country=="Côte d'Ivoire" ~ "Cote d'Ivoire" ,
# country=="Democratic Republic of the Congo" ~ "Congo, Dem. Rep." ,
# country=="Congo" ~ "Congo, Rep." ,
# country=="Curacao" ~ "Curacao" ,
# country=="Czechia" ~ "Czech Republic" ,
# country=="Egypt" ~ "Egypt, Arab Rep." ,
# country=="Micronesia (Federated States of)" ~ "Micronesia, Fed. Sts." ,
# country=="United Kingdom" ~ "United Kingdom" ,
# country=="Gambia" ~ "Gambia, The" ,
# country=="Iran (Islamic Republic of)" ~ "Iran, Islamic Rep." ,
# country=="Kyrgyzstan" ~ "Kyrgyz Republic" ,
# country=="Republic of Korea" ~ "Korea, Rep." ,
# country=="Lao People's Democratic Republic" ~ "Lao PDR" ,
# country=="Saint Kitts and Nevis " ~ "St. Kitts and Nevis",
# country=="Saint Lucia" ~ "St. Lucia",
# country=="Republic of Moldova" ~ "Moldova" ,
# country=="Democratic People's Republic of Korea" ~ "Korea, Dem. People’s Rep." ,
# country=="Slovakia" ~ "Slovak Republic" ,
# country=="United Republic of Tanzania" ~ "Tanzania" ,
# country=="United States of America" ~ "United States" ,
# country=="Saint Vincent and the Grenadines" ~ "St. Vincent and the Grenadines" ,
# country=="Venezuela (Bolivarian Republic of)" ~ "Venezuela, RB" ,
# country=="British Virgin Islands" ~ "British Virgin Islands" ,
# country=="Viet Nam" ~ "Vietnam" ,
# country=="Yemen" ~ "Yemen, Rep.",
# country=="United Kingdom of Great Britain and Northern Ireland" ~ "United Kingdom",
# TRUE ~ country
# )) %>%
# mutate(RAW.5.5.DIFI=if_else(SPI.5.5.DIFI=='no data',as.character(NA), SPI.5.5.DIFI),
# SPI.5.5.DIFI=if_else(SPI.5.5.DIFI=='no data',as.numeric(NA), as.numeric(SPI.5.5.DIFI))) %>%
# right_join(spi_df_empty)
#add to spi databases
spi_df <- spi_df %>%
left_join(D5.5_DIFI)
D5.5_DIFI_map <- D5.5_DIFI %>%
filter(!is.na(RAW.5.5.DIFI))
spi_mapper('D5.5_DIFI_map', 'SPI.5.5.DIFI', 'Statistical plan fully funded. Source: Paris21')
#add in population
pop_df <- wbstats::wb(country="all",
indicator='SP.POP.TOTL',
startdate=2004,
enddate=2019) %>%
mutate(date=as.numeric(date)) %>%
mutate(population=value) %>%
select(country, date, population)
# save final dataset
spi_df_final <- spi_df %>%
select(country, iso3c, date, starts_with("SPI"), starts_with("RAW")) %>%
right_join(spi_df_empty) %>%
left_join(pop_df) %>%
arrange(-date,country)
#write to csv
spi_df_final %>%
write_excel_csv( path = paste(output_dir, 'SPI_data.csv', sep="/"))
#label data
#read in metadata
names_df <- data.frame('source_id'=colnames(spi_df_final))
spi_meta <- read_csv(paste(raw_dir, '/metadata/SPI_dimensions_sources.csv', sep="")) %>%
right_join(names_df) %>%
mutate(source_name=if_else(is.na(source_name), paste(source_id), source_name)) %>%
arrange(factor(source_id, levels=colnames(spi_df_final)))
#label data
label(spi_df_final) = as.list(spi_meta$source_name)
#write to stata
spi_df_final %>%
rename_with(~gsub(".","_",.x, fixed=TRUE)) %>%
select(-lat,-long) %>%
write_dta( path = paste(output_dir, 'SPI_data.dta', sep="/"))
#basic analysis
library(corrr)
spi_df_final_cor <- spi_df_final %>%
filter(date==2018) %>%
select(contains("SPI.")) %>%
correlate()
Social Protection Admin Data